Abstract

As a significant factor in urban planning, traffic forecasting and the
prediction of epidemics, the modeling of human mobility patterns has drawn
intensive attention from researchers for decades. Power-law distributions
and their variations have been observed in quite a few real-world human
mobility datasets, such as the movements of banknotes, tracked locations of
cell phone users and trajectories of vehicles. In this paper, we build
models for 20 million fine-grained trajectories collected from more than
10 thousand taxis in Beijing. In contrast to most models observed in human
mobility data, the taxis' travel displacements in urban areas tend to follow
an exponential distribution instead of a power law. Similarly, the elapsed
time can also be well approximated by an exponential distribution. Notably,
analysis of the interevent time indicates the bursty nature of human
mobility, similar to many other human activities.

Keywords: 

Human mobility, Urban mobility, GPS data, Exponential distribution 



1. Introduction 

The emergence of location tracking devices (e.g., GPS navigators and mobile
devices) and location-based services (LBS, e.g., Foursquare, Yelp check-ins
and Google Places) provides unprecedented opportunities to study human
mobility patterns from trillions of trails and footprints, which are of
great significance in urban planning [1], traffic forecasting [2], marketing
campaigns [3], prediction of epidemics [4] and the design of mobile network
protocols [5].

By studying the tracked locations of cell phone users, researchers observe
that people's daily activities (e.g., commuting between home and workplace,
going shopping, and going on vacation) follow reproducible mobility patterns
[6, 7, 8]. Similar results are also found in GPS trackings [9-13], wireless
network traces [14], check-ins from location-based services [15, 16] and
even movements of banknotes [17].

Specifically, Brockmann et al. [17] observe that the distribution of
displacements between consecutive reported sightings of banknotes follows a
power law, and they conclude that human travel behavior can be described by
Lévy walks with heavy-tailed pause times. Similarly, [10] and [2] observe
that human trajectory data from GPS trackings can be approximated by Lévy
flights. Researchers have even noticed clear Lévy flights in mobility
patterns derived from the trails of animals [18], [19]. Despite the
prevailing Lévy walks, González et al. [6] discover spatio-temporal
regularity in human movements, indicating that people are very likely to
return to a few frequently visited locations. Song et al. [7] find that
human movements can be predicted with high confidence owing to the
underlying reproducibility of human mobility patterns. The authors also
propose a new model to characterize human mobility patterns [20]. Han et al.
[21] demonstrate that the scaling law in human mobility can be explained by
the hierarchy of traffic systems.

However, the datasets mentioned above also have their limitations. For
example, banknotes may be deposited in banks and withdrawn only after a long
time, or change hands many times between consecutive observations. As for
the tracking of cell phone users, when people initiate or receive calls and
text messages they are probably on the way from their origins to their
destinations, so the recorded locations need not be the endpoints of their
trips. Furthermore, the granularity of some datasets is coarse, the
trajectories run between cities that are far from each other, and the
recorded locations may deviate considerably from people's actual movements.

In this paper, we study human mobility patterns in 20 million trajectories
extracted from more than 10,000 taxis in the urban areas of Beijing.
Compared with the other datasets mentioned above, the taxi trails have very
fine granularity. In addition, the data may also reveal the effects of the
urban traffic network on people's movements.

The rest of the paper is organized as follows. In Section 2 we introduce
the method of model selection. Section 3 describes the data used in the
paper. We present our analysis and discuss the findings in Section 4.
Finally, we conclude in Section 5.



2. Preliminary 

Model selection identifies, from a set of candidate models, the one that is
best supported by the actual data. In contrast to fit maximization and null
hypothesis tests, model selection criteria consider both the fit to the data
and the complexity of the models, and they allow multiple models to be
compared at the same time. Two criteria are commonly used: the Akaike
information criterion (AIC) and the Bayesian information criterion (BIC)
[22], [23].

In this paper, we mainly employ AIC to compare two models: a power law
$y = Ax^{-\alpha}$ and an exponential $y = Be^{-\lambda x}$, where $A$ and
$B$ are normalization constants. The steps of model selection are as follows.

1. Estimating the parameters of the models using the maximum likelihood
   method. Details on how to perform maximum likelihood estimation (MLE)
   for these models can be found in [24], [25] and [26].

2. Calculating the AIC scores for the models. The AIC score for model $i$
   ($i \in \{1, 2\}$) is given by

   $$\mathrm{AIC}_i = -2 \log L_i + 2K_i$$

   Here $L_i$ is the likelihood evaluated at the parameter values estimated
   in step 1, and $K_i$ is the number of parameters in model $i$.

3. Determining the best model. The Akaike weights can be interpreted as the
   relative likelihoods of each model being the best one for the observed
   data. Let

   $$\mathrm{AIC}_{\min} = \min_{i \in \{1, 2\}} \mathrm{AIC}_i$$

   $$\Delta_i = \mathrm{AIC}_i - \mathrm{AIC}_{\min}, \quad i \in \{1, 2\}$$

   Then the Akaike weights are given by

   $$W_i = \frac{e^{-\Delta_i/2}}{\sum_{j=1}^{2} e^{-\Delta_j/2}}, \quad i \in \{1, 2\}$$

   The model with the largest Akaike weight is selected as the best one.
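The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the exact procedure behind the tables in this paper: the samples are assumed to be bounded below by a hypothetical cutoff `x_min` with no upper truncation, the closed-form maximum likelihood estimators for a shifted exponential and a continuous power law are used, and each model is counted as having one free parameter. All function names are illustrative.

```python
import math
import random


def fit_exponential(xs, x_min):
    """MLE for p(x) = lam * exp(-lam * (x - x_min)), x >= x_min.

    Returns (lam, maximized log-likelihood)."""
    n = len(xs)
    excess = sum(x - x_min for x in xs)
    lam = n / excess
    return lam, n * math.log(lam) - lam * excess


def fit_power_law(xs, x_min):
    """MLE for p(x) = (alpha - 1) / x_min * (x / x_min) ** (-alpha), x >= x_min.

    Returns (alpha, maximized log-likelihood)."""
    n = len(xs)
    log_sum = sum(math.log(x / x_min) for x in xs)
    alpha = 1.0 + n / log_sum
    return alpha, n * math.log((alpha - 1.0) / x_min) - alpha * log_sum


def akaike_weights(log_likelihoods, n_params):
    """Akaike weights W_i from maximized log-likelihoods and parameter counts."""
    aic = [-2.0 * ll + 2.0 * k for ll, k in zip(log_likelihoods, n_params)]
    aic_min = min(aic)
    rel = [math.exp(-(a - aic_min) / 2.0) for a in aic]
    total = sum(rel)
    return [r / total for r in rel]


# Synthetic check (assumed data, not the taxi datasets): exponential samples
# above x_min should be attributed to the exponential model.
rng = random.Random(0)
x_min = 2.0
samples = [x_min + rng.expovariate(0.23) for _ in range(5000)]
lam, ll_exp = fit_exponential(samples, x_min)
alpha, ll_pow = fit_power_law(samples, x_min)
w_pow, w_exp = akaike_weights([ll_pow, ll_exp], [1, 1])
print(f"lambda={lam:.4f} alpha={alpha:.4f} W_pow={w_pow:.3g} W_exp={w_exp:.3g}")
```

Subtracting the minimum AIC before exponentiating keeps the weights numerically stable even when the raw AIC scores are very large.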
3. Data description

To explore urban movements, we use two GPS datasets generated by over
10,000 taxis in Beijing, China, from Oct. 1st, 2010 to Dec. 31st, 2010
($D_1$) and from Oct. 1st, 2008 to Nov. 30th, 2008 ($D_2$) respectively.
The GPS data from every taxi were collected at about 1-minute intervals.
In addition to the location $l$ (longitude, latitude) and the instantaneous
velocity $v$, the operational status $s$ (with or without passengers) of
each taxi was captured by the GPS equipment at the same time. Thus, the GPS
data for a taxi $k$ at time $t$ can be denoted by a tuple $(k, l, v, s, t)$.
While a taxi is carrying customers, its movement serves as a proxy for
human mobility patterns in urban areas.

From both datasets, tens of millions of human trajectories can be extracted.
More specifically, using the recorded operational status, we can identify
when and where customers got into and out of the same taxi. A trajectory can
therefore be uniquely represented by $(k, l_O, t_O, l_D, t_D)$, which means
that the customers traveled with taxi $k$ from the origin $l_O$ at time
$t_O$ to the destination $l_D$ at time $t_D$. The tuples of taxi $k$ with
time $t$ between $t_O$ and $t_D$ are associated with the trajectory. The
displacement $\Delta l$ and the elapsed time $\Delta T$ of the trajectory
are calculated as follows:

$$\Delta l = |l_D - l_O|$$

$$\Delta T = |t_D - t_O|$$

If the next trajectory of taxi $k$ is $(k, l'_O, t'_O, l'_D, t'_D)$, the
interevent time $\tau$, that is, the duration without passengers, is
computed by

$$\tau = |t'_O - t_D|$$
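The definitions above can be illustrated with a short Python sketch. It rests on several assumptions that are not specified in the text: the records of one taxi are given as time-ordered tuples `(t_min, lon, lat, occupied)` with time in minutes, the destination is taken as the last occupied record of a run, and $|l_D - l_O|$ is evaluated as the great-circle (haversine) distance. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def extract_trajectories(records):
    """Split one taxi's time-ordered records into (origin, destination) pairs.

    Each record is (t_min, lon, lat, occupied). A trajectory starts at the
    first occupied record of a run and ends at the last occupied record
    before the taxi becomes vacant. A run still open at the end of the data
    is dropped because its destination is unknown."""
    trips, start, last = [], None, None
    for t, lon, lat, occupied in records:
        if occupied:
            if start is None:
                start = (t, lon, lat)
            last = (t, lon, lat)
        elif start is not None:
            trips.append((start, last))
            start = None
    return trips


def trip_statistics(trips):
    """Return (delta_l_km, delta_t_min, tau_min) per trajectory.

    tau is the vacant time until the next pick-up, or None for the last trip."""
    stats = []
    for i, (origin, dest) in enumerate(trips):
        delta_l = haversine_km(origin[1], origin[2], dest[1], dest[2])
        delta_t = dest[0] - origin[0]
        tau = trips[i + 1][0][0] - dest[0] if i + 1 < len(trips) else None
        stats.append((delta_l, delta_t, tau))
    return stats


# Made-up records for one taxi: two passenger trips separated by a vacant gap.
records = [
    (0, 116.30, 39.90, 0),
    (1, 116.30, 39.90, 1),
    (2, 116.31, 39.90, 1),
    (3, 116.32, 39.90, 1),
    (4, 116.32, 39.90, 0),
    (10, 116.32, 39.91, 1),
    (12, 116.32, 39.93, 1),
    (13, 116.32, 39.93, 0),
]
stats = trip_statistics(extract_trajectories(records))
```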

However, some abnormal tracks need to be excluded from the results. For
example, when $\Delta T < 1$ min or $\Delta T > 120$ min, the trajectory is
considered invalid, because passengers seldom spend so little or so much
time in a taxi. Finally, we obtain 12,028,929 trajectories in $D_1$ and
9,942,697 in $D_2$, as shown in Table 1. It is worth noting that our
analyses are mainly based on $D_1$, whereas $D_2$ is compared with $D_1$ to
detect changes in trends.

Table 1: The numbers of trajectories and taxis in both datasets

Dataset   Period      Number of trajectories   Number of taxis
$D_1$     Oct. 2010   3,932,107                11,686
          Nov. 2010   3,754,405                11,636
          Dec. 2010   4,342,417                11,562
          Total       12,028,929               11,686
$D_2$     Oct. 2008   5,111,144                10,695
          Nov. 2008   4,831,553                10,671
          Total       9,942,697                10,695



4. Statistical results and explanations 

In this section, we first investigate the displacements between origins and
destinations. Then the elapsed time of trips is analyzed and the correlation
between displacement and elapsed time is revealed. Finally, we study the
duration with no passengers (i.e., the interevent time), which reflects
human travel demand.

4.1. Displacement

For a trajectory, the origin-destination pair directly reflects the purpose
of the movement. The displacement is the length of the line segment
connecting the origin and the destination. Unlike the actual path traversed
from the origin to the destination in the city, the displacement is not
influenced by geographic constraints or artificial interference, so it is
better suited to characterizing human mobility in urban areas.

We therefore measure the probability $P(\Delta l)$ of a displacement
$\Delta l$ in the dataset $D_1$. As shown in Figure 1(a), $P(\Delta l)$
rises sharply at first and reaches its peak when $\Delta l$ is about 2 km.
After that, it decreases dramatically. The initial rise occurs because taxi
trips are seldom very short. Moreover, approximately 98% of trajectories
cover a displacement of less than 20 km, which suggests that most movements
occur in urban areas. Intuitively, most individuals tend to move around in
the neighborhood of a few places (e.g., homes, schools and workplaces) in
daily life. Consequently, the statistical distribution agrees well with our
experience.

Figure 1: Displacements of trajectories. (a) The probability density
function $P(\Delta l)$ obtained for the studied dataset $D_1$, plotted
against the displacement $\Delta l$ (km). The inset shows the first part of
$P(\Delta l)$ in semi-log scale. The red dashed line indicates an
exponential with measured exponent $\lambda = 0.2329$. The blue dotted line
represents an exponential with measured exponent $\lambda = 0.1698$. (b) The
CCDFs of displacements for both datasets.

Given the distribution of displacements, we partition it into two parts at a
displacement of 20 km. In the first part, most displacements occur within
urban areas, while in the latter part the displacements often occur between
urban areas and suburbs. For economic reasons, few people choose taxis to
cover a large displacement. The distribution appears to show different
trends in the two parts. Therefore, we use the AIC described in Section 2 to
compare two frequently used models for each part: a power law
$P(\Delta l) \sim \Delta l^{-\alpha}$ and an exponential
$P(\Delta l) \sim e^{-\lambda \Delta l}$. The detailed results of model
selection are given in Table 2. From the table, we conclude that the
distribution of displacements is well fitted by an exponential distribution
with an exponential cutoff, because $W_{exp} \gg W_{pow}$ in both parts. The
two different exponential fits in Figure 1(a) further support this
conclusion. We also notice that the exponent of the exponential for the
first part is larger than that for the latter part. This is reasonable
because people who travel large displacements by taxi are less sensitive to
taxi fares, so the tail of the distribution decays slightly more slowly. In
addition, it is interesting that the power-law exponents obtained for the
first part in both datasets are not far from those observed in [17] (1.59)
and [6] (1.75). From the above results, it can be inferred that
displacements in urban areas are distributed according to an exponential
rather than a power law.

Furthermore, the distributions of displacements for both datasets are
plotted in Figure 1(b). To reduce errors in the tail of the distribution, we
use the complementary cumulative distribution function (CCDF) here. The two
distributions almost coincide. In addition, the MLE results for the two
datasets are very close to each other, as shown in Table 2. Together, these
indicate that human travel patterns did not change noticeably between 2008
and 2010.
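As a brief illustration of the CCDF used here, the following Python sketch computes an empirical CCDF from a list of samples. It assumes the convention $P(X > x)$ (the text does not say whether the inequality is strict), and the function name is hypothetical.

```python
from bisect import bisect_right


def empirical_ccdf(samples):
    """Return [(x, P(X > x))] for each distinct sample value x, in ascending order."""
    xs = sorted(samples)
    n = len(xs)
    points = []
    for x in sorted(set(xs)):
        # bisect_right gives the number of samples <= x in the sorted list.
        greater = n - bisect_right(xs, x)
        points.append((x, greater / n))
    return points


# Small made-up example of displacements in km.
ccdf = empirical_ccdf([1.0, 2.0, 2.0, 4.0])
```

Because every point of the CCDF aggregates all samples above it, the sparse tail is much less noisy than in a binned density estimate, which is the reason given above for preferring it.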

Table 2: Results of model selection for displacement in both datasets.

Part          Model         Statistic         $D_1$              $D_2$
First part    Power law     MLE for $\alpha$  1.4869             1.5108
                            95% CI§           (1.4859, 1.4880)   (1.5097, 1.5119)
                            $R^2$†            0.9112             0.8950
                            $W_{pow}$*
              Exponential   MLE for $\lambda$ 0.2329             0.2403
                            95% CI            (0.2328, 0.2331)   (0.2401, 0.2405)
                            $R^2$             0.9989             0.9995
                            $W_{exp}$*        1                  1
Latter part   Power law     MLE for $\alpha$  5.1048             5.2729
                            95% CI            (5.0888, 5.1208)   (5.2519, 5.2939)
                            $R^2$             0.9119             0.9165
                            $W_{pow}$
              Exponential   MLE for $\lambda$ 0.1698             0.1768
                            95% CI            (0.1692, 0.1705)   (0.1760, 0.1777)
                            $R^2$             0.9707             0.9732
                            $W_{exp}$         1                  1

§ A 95% confidence interval.
† Coefficient of determination to measure the goodness of fit of a model.
* Akaike weights representing relative likelihoods of models.



However, the observed shape of $P(\Delta l)$ may be caused by geographic
heterogeneity (i.e., the statistical properties of human travel differ among
distinct locations). We discuss the influence of geography on human mobility
below. Five representative local areas, each a circular region with a radius
of 1 km, are selected: Beihang University (BHU), Beijing Railway Station
(BRS), Beijing Capital International Airport (BCIA), Xidan (a business
district) and a residential district. These areas have distinct transport
features and population densities, so they are suitable for investigating
the discrepancies caused by geographic heterogeneity.

The five distributions of displacements for trips starting in these regions
in the $D_1$ dataset are plotted separately in Figure 2. The distributions,
except that for BCIA, agree with each other very well and are well
approximated by the exponential fit that also applies to the overall
distribution of displacements described above. Also, most trips from these
areas have short displacements of less than 20 km. Movements starting at
BCIA, however, have larger displacements on average and differ evidently
from those in the other areas. The reason is that BCIA is located in a
suburb of Beijing, so passengers leaving the airport usually have to travel
long distances by taxi to reach urban areas. At the same time, the tail of
the BCIA distribution appears to decay exponentially with the same trend as
the others. Since this paper mainly focuses on movements in urban areas, we
conclude that geographic heterogeneity does not affect our results on the
observed human mobility patterns. The statistical pattern arises from the
intrinsic travel demand in urban environments.
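Selecting the trips that start inside one of these circular regions can be sketched as below. This is an assumed implementation, not the authors' code: it uses a local planar (equirectangular) approximation, which is adequate at the 1 km scale, and the center coordinates in the example are placeholders rather than the real coordinates of any of the five areas.

```python
import math

EARTH_RADIUS_KM = 6371.0


def within_radius(point, center, radius_km=1.0):
    """True if point (lon, lat) lies within radius_km of center (lon, lat).

    Uses a local planar approximation around the center."""
    lon, lat = point
    lon0, lat0 = center
    dx = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_KM
    dy = math.radians(lat - lat0) * EARTH_RADIUS_KM
    return math.hypot(dx, dy) <= radius_km


def displacements_from_region(trips, center, radius_km=1.0):
    """Keep the displacements of trips whose origin falls inside the region.

    trips is an iterable of ((origin_lon, origin_lat), delta_l_km)."""
    return [delta_l for origin, delta_l in trips
            if within_radius(origin, center, radius_km)]


# Placeholder center and made-up trips for illustration only.
center = (116.35, 39.98)
trips = [((116.352, 39.981), 3.2), ((116.40, 39.98), 7.5)]
selected = displacements_from_region(trips, center)
```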




Figure 2: Distributions of displacements $\Delta l$ (km) in different areas.
The black line represents the exponential fit with exponent 0.2329, which
applies to the overall distribution for $D_1$.



It must be noted that our findings do not coincide with those obtained from
banknotes [17] and mobile phones [6], which demonstrated that the
distribution of displacements can be well approximated by a power law or a
truncated power law. Given the characteristics of the GPS data from taxis,
there are probably two reasons for the inconsistency. First, the movements
in our datasets occur mostly in urban areas and have small displacements. In
the other datasets, however, people travel around the whole country by
different means of transportation and can cover distances of more than
1000 km; such long-distance travel usually happens between cities. So the
displacements in urban environments decay faster than those measured in the
other datasets. Second, the taxi fare increases with the distance traveled.
When people intend to travel long distances, they may choose other means of
public transportation rather than taxis for economic reasons. As a result,
trajectories with long displacements are less likely to be captured by
taxis. In summary, because of these two factors, the probability
distribution of displacements in urban areas decays more quickly than those
in other datasets and tends to be exponential. To what degree the two
factors affect human travel demand in cities should be studied further on
more datasets.

In addition, the travels of only 50 taxis collected from four cities in
Sweden have also been studied. The distribution of trail length there
follows a double power law, implying that intracity and intercity movements
each show a scaling property. In those results, the distribution within
cities still follows a power law in spite of economic effects. Compared with
that dataset, ours are more comprehensive and cover the whole city. Also,
through analyzing GPS traces of taxis in Lisbon, [13] show that trip
distance can be well represented by an exponential distribution. It is worth
noting that the exponential parameter they obtain ($\lambda = 0.26$) is not
far from ours. [11] consider GPS data of private cars in the urban area of
Florence. They observe that the total distance of daily round trips for each
vehicle agrees very well with an exponential distribution. They also
identify vehicle stop positions and find that the single-trip length
distribution deviates from an exponential and gradually favors a power law
as the trip length becomes longer. In summary, the results on urban mobility
in [13] and [11] are consistent with ours. Thus, it can be conjectured that
the phenomenon is not accidental and exists widely in the urban areas of
cities.

4.2. Elapsed time

The elapsed time $\Delta T$ is the time that passengers spend traveling from
their origins to their destinations. Given the origin and destination, the
elapsed time may be influenced by many factors, such as current traffic
conditions, the length of the chosen path and driving habits. We compute the
elapsed time of the trajectories in both datasets. The distribution of
elapsed time for the dataset $D_1$ is shown in Figure 3(a). Similar to the
displacements, the distribution of elapsed time increases when
$\Delta T < 7$ min and then decreases dramatically. About 98.9% of
trajectories in $D_1$ and 99.5% of those in $D_2$ have an elapsed time of
less than 60 min. Short elapsed times mainly result from the many trips with
small displacements in urban areas, while long elapsed times are often
caused by traffic jams and by long trips between urban areas and suburbs. As
before, we partition the distribution into two parts at an elapsed time of
60 min and use the model selection method described in Section 2 to decide
on good fits. The detailed results are listed in Table 3. They show that the
first parts of the distributions are well approximated by exponential fits
in both datasets, while the latter parts of the distributions in $D_1$ and
$D_2$ favor an exponential and a power law respectively. Because we mainly
focus on movements in urban areas, we conclude from the first parts of the
distributions that the elapsed time of trips in urban areas agrees very well
with an exponential.




Figure 3: Elapsed time of trajectories. (a) The probability density function
$P(\Delta T)$ obtained for the studied dataset $D_1$ in semi-log scale,
plotted against the elapsed time $\Delta T$ (min). The red dashed line
indicates an exponential with measured exponent $\lambda = 0.0797$. The blue
dotted line represents an exponential with measured exponent
$\lambda = 0.0692$. (b) The CCDFs of elapsed time for both datasets.

Table 3: Results of model selection for elapsed time in both datasets.

Part          Model         Statistic         $D_1$              $D_2$
First part    Power law     MLE for $\alpha$  1.5549             1.7057
                            95% CI§           (1.5539, 1.5559)   (1.7046, 1.7069)
                            $R^2$†            0.8523             0.8318
                            $W_{pow}$*
              Exponential   MLE for $\lambda$ 0.0797             0.0912
                            95% CI            (0.0796, 0.0797)   (0.0911, 0.0912)
                            $R^2$             0.9927             0.9868
                            $W_{exp}$         1                  1
Latter part   Power law     MLE for $\alpha$  5.4924             5.7775
                            95% CI            (5.4602, 5.5246)   (5.7243, 5.8306)
                            $R^2$             0.9965             0.9991
                            $W_{pow}$                            1
              Exponential   MLE for $\lambda$ 0.0692             0.0727
                            95% CI            (0.0688, 0.0696)   (0.0720, 0.0735)
                            $R^2$             0.9984             0.9907
                            $W_{exp}$

§ A 95% confidence interval.
† Coefficient of determination to measure the goodness of fit of a model.
* Akaike weights representing relative likelihoods of models.

Furthermore, to explore how the elapsed time changes over time, we plot the
distributions of elapsed time for both datasets. As shown in Figure 3(b),
the distribution for $D_2$ drops more quickly than that for $D_1$,
indicating that passengers in 2010 spent more time in taxis on average than
in 2008. At the same time, the displacements people traveled did not change
much, as mentioned in Subsection 4.1. In addition, we can see from Figure 4
that the mean elapsed time for the same displacement is generally longer in
$D_1$ than in $D_2$. So it can be concluded that traffic conditions in
Beijing have become worse since 2008, which agrees very well with the facts.
From Figure 4, we also notice that the elapsed time grows more slowly with
displacement, especially when the displacement is larger than 20 km, which
implies a higher average speed. This is reasonable, since trips with large
displacements are expected to take place away from the traffic jams in urban
areas.
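The per-displacement means behind this comparison can be computed with a simple binning step, sketched here in Python. The bin width of 1 km and the function name are assumptions made for illustration; the text does not state how the displacements were grouped.

```python
from collections import defaultdict


def mean_elapsed_by_displacement(trips, bin_km=1.0):
    """Mean elapsed time per displacement bin.

    trips is an iterable of (delta_l_km, delta_t_min). Returns a dict that
    maps the lower edge of each non-empty bin (km) to the mean elapsed time
    (min) of the trips falling in that bin."""
    sums = defaultdict(lambda: [0.0, 0])
    for delta_l, delta_t in trips:
        b = int(delta_l // bin_km)
        sums[b][0] += delta_t
        sums[b][1] += 1
    return {b * bin_km: total / count
            for b, (total, count) in sorted(sums.items())}


# Made-up trips: two short ones in the first bin and one in the third bin.
means = mean_elapsed_by_displacement([(0.5, 4.0), (0.7, 6.0), (2.2, 10.0)])
```

Running the same function on each dataset and comparing the two resulting curves bin by bin corresponds to the comparison described above.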




Figure 4: Comparison of traffic congestion status between the two datasets.
Each red/blue point represents the mean elapsed time for a given
displacement (km).

4.3. Correlation between displacement and elapsed time

The first parts of the distributions of displacement and elapsed time both
follow exponentials very well. In this subsection we discuss the correlation
between them. Note that we use the dataset $D_1$ for the experiments below;
similar phenomena are observed in the dataset $D_2$.

For every individual movement, the displacement $\Delta l$ and the elapsed
time $\Delta T$ have been measured. There are many trajectories with the
same elapsed time $\Delta T$, and these often correspond to different
displacements $\Delta l$. The correlation is shown in Figure 5. From the
graph, we observe that the mean displacement increases with elapsed time,
and that the growth rate gradually slows when the elapsed time is large.
This shows that long elapsed times are often caused by traffic congestion,
which lowers the average speed. In particular, when $\Delta T < 40$ min, the
means are well fitted by a linear function $\Delta l = \mu \Delta T$ with
$\mu = 0.3326$. It must be emphasized that this relation between $\Delta l$
and $\Delta T$ is a numerical approximation and does not hold in general. As
we know from the above observations, the displacements of 98% of
trajectories are less than 20 km and the elapsed times of 95% of
trajectories are less than 40 min. Hence the relationship between the
exponents of the first parts of both distributions can be approximately
derived as follows.

Figure 5: Correlation between displacement and elapsed time. Each black
point represents the mean displacement for a given elapsed time (min). The
dashed line denotes the fit $\Delta l = \mu \Delta T$ with $\mu = 0.3326$.

Given the exponents $\lambda_l$ and $\lambda_T$ for the first parts of the
two distributions respectively, the fit can be represented by

$$E(\Delta l \mid \Delta T) = \mu \Delta T$$

Hence,

$$\sum_{\Delta T} E(\Delta l \mid \Delta T) P(\Delta T) = \sum_{\Delta T} \mu \Delta T P(\Delta T)$$

$$E(\Delta l) = \mu E(\Delta T)$$

Because the first parts of the distributions of displacement and elapsed
time cover most trajectories and decay exponentially, the expectations of
both distributions are approximately

$$E(\Delta l) = \frac{1}{\lambda_l}, \quad E(\Delta T) = \frac{1}{\lambda_T}$$

So the relationship is given by

$$\lambda_T = \lambda_l \mu$$

From Tables 2 and 3 we obtain $\lambda_l = 0.2329$ and $\lambda_T = 0.0797$.
$\lambda_T$ can also be calculated from the relationship as
$0.2329 \times 0.3326 \approx 0.0775$. This value is very close to the MLE
for $\lambda_T$.
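The arithmetic of this check, together with one possible way of obtaining the slope $\mu$, is sketched below. The least-squares fit through the origin is an assumption (the text does not state how $\mu$ was fitted), the function name is hypothetical, and the numerical values are those reported in the text and in Tables 2 and 3.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope mu of the model y = mu * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Values reported in the text and in Tables 2 and 3.
mu = 0.3326
lambda_l = 0.2329
lambda_t_mle = 0.0797

# Relationship derived above: lambda_T = lambda_l * mu.
lambda_t_predicted = lambda_l * mu
relative_gap = abs(lambda_t_mle - lambda_t_predicted) / lambda_t_mle
print(round(lambda_t_predicted, 4), round(relative_gap, 3))  # prints 0.0775 0.028
```

The predicted value differs from the MLE by about 3%, which quantifies the statement that the two are very close.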

4.4. Interevent time

After carrying passengers from origins to destinations, taxis cruise or wait
on the roads in search of new passengers. The interevent time is thus the
time spent waiting for new customers. Intuitively, the interevent time
indicates how busy taxis are during a certain period of time: a short
interevent time usually means greater travel demand. Therefore, to a large
extent, it reflects human travel demand indirectly.

To characterize human travel demand, we compute the interevent time $\tau$
from the trajectories of $D_1$. The CCDF of the interevent time is shown in
Figure 6(a). From the graph, we can see that about 98% of interevent times
are no more than 200 minutes. Below 200 minutes, the curve fits a power law
very well; beyond that, it decreases exponentially. The deviation from a
power law in the tail has two causes. On the one hand, long interevent times
often occur late at night, when there is little travel demand. On the other
hand, many taxi drivers stop working to rest at midnight. In addition, two
taxis are chosen at random and the CCDFs of their interevent times are
plotted in Figures 6(b) and 6(c) respectively. Both curves are well
approximated by power laws, and the two exponents differ only slightly.
Moreover, we plot the distribution of the power-law exponents of the CCDFs
for all taxis in Figure 6(d). As the figure shows, most taxis have similar
power-law exponents around the mean value 1.19. Note that this exponent
refers to the CCDF, so the exponent of the corresponding probability density
function (PDF) is very close to 2.
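The step from the CCDF exponent to the PDF exponent can be written out explicitly. The short derivation below assumes that the CCDF is a pure power law over the fitted range; $\tau'$ denotes the random interevent time and $\alpha$ the CCDF exponent.

```latex
% Assumption: the CCDF is a pure power law over the fitted range.
P(\tau' > \tau) \propto \tau^{-\alpha}
\quad\Longrightarrow\quad
p(\tau) = -\frac{d}{d\tau} P(\tau' > \tau) \propto \tau^{-(\alpha + 1)}

% With the mean CCDF exponent \alpha = 1.19 reported for all taxis:
\alpha + 1 = 1.19 + 1 = 2.19
```

Differentiating a power law raises the magnitude of its exponent by one, so a CCDF exponent of 1.19 corresponds to a PDF exponent of 2.19, which is the sense in which the PDF exponent is close to 2.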

In summary, the durations of taxis without passengers (inactivity) are
distributed approximately according to an inverse-square power law, while
the durations of taxis with passengers (activity) are well approximated by
an exponential. These results resemble those found in many animals with
patterns of activity and inactivity, where durations of inactivity
approximately follow inverse-square power-law distributions while durations
of activity are fitted by exponential distributions [27]. There may be
subtle relations between the two that deserve further study.

Furthermore, we divide the taxi trajectories into two categories, workdays
and rest days (including weekends and public holidays). We then calculate
the mean interevent time at different hours of the day, as shown in
Figure 7. As the graph indicates, the mean interevent time exceeds
30 minutes during the time slot 23:00-5:00 on both workdays and rest days,
because people usually stay at home to sleep and seldom need to travel. We
also notice that the interevent time during this time slot is shorter on
rest days than on workdays, because people more readily go out for
entertainment on rest days. Similarly, people usually take a lunch break,
resulting in longer interevent times during the time slot 11:00-12:00. On
workdays, trips are concentrated in the time slots 7:00-9:00 and
17:00-19:00, corresponding to the rush hours when people go to and from
work. On rest days, there is more travel demand in the afternoon and evening
than in the morning. Therefore, it can be concluded that human travel in
cities is bursty: large numbers of movements emerge suddenly, separated by
long periods of inactivity, because of the strong periodic variation of
travel demand in cities.
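The hourly averages described above can be sketched in Python as follows. Several choices here are assumptions rather than details given in the text: each interevent interval is attributed to the hour in which it starts, weekends are Saturday and Sunday, public holidays are supplied by the caller as a set of dates, and make-up working weekends are ignored. The function name is hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime


def mean_interevent_by_hour(gaps, holidays=frozenset()):
    """Mean interevent time per (day type, hour of day).

    gaps is an iterable of (start, tau_min), where start is the datetime at
    which the vacant period begins and tau_min is its length in minutes.
    Returns a dict keyed by ("work" | "rest", hour)."""
    totals = defaultdict(lambda: [0.0, 0])
    for start, tau_min in gaps:
        is_rest = start.weekday() >= 5 or start.date() in holidays
        key = ("rest" if is_rest else "work", start.hour)
        totals[key][0] += tau_min
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Made-up gaps: two on a Monday morning, one on a Saturday morning and one
# on a Tuesday that is passed in as a public holiday.
gaps = [
    (datetime(2010, 10, 11, 8, 30), 5.0),
    (datetime(2010, 10, 11, 8, 45), 7.0),
    (datetime(2010, 10, 9, 8, 0), 10.0),
    (datetime(2010, 10, 5, 3, 0), 40.0),
]
hourly = mean_interevent_by_hour(gaps, holidays={date(2010, 10, 5)})
```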

The results on interevent time demonstrate that human travel demand follows
a non-Poisson process, like many other human activities such as sending or
receiving emails, web browsing and library loans [28]. In contrast to an
exponential distribution, a heavy-tailed distribution of interevent time
allows events to occur frequently within a relatively short period of time
and then to be absent for a long time. So the periodic variation of travel
demand in cities may partly account for the results on interevent time.

Figure 6: Interevent time from the dataset $D_1$. (a) The CCDF for all taxis
with exponent $\alpha = 1.12$. (b-c) The CCDFs for two taxis with exponents
$\alpha$ of 1.01 and 1.35 respectively. (d) The distribution of power-law
exponents of CCDFs for all taxis with mean $\mu = 1.19$ and standard
deviation $\sigma = 0.26$.

Figure 7: Mean value of interevent time per hour of the day (h) on workdays
and rest days.

5. Conclusion and future work 

In this paper, we build models for 20 million trajectories collected from
more than 10 thousand taxis in the urban areas of Beijing. The main
contributions of this paper are threefold: (i) the taxis' travel
displacements in urban areas tend to follow an exponential distribution
instead of a power law, probably because of the limited range of the
movements and economic effects; (ii) similarly, the elapsed time can also be
well approximated by an exponential distribution; (iii) bursty activity is
observed in the interevent time between trajectories, which helps validate
the bursty nature of human daily activities.

As future work, we are interested in further exploring the two factors that
affect the trajectories of taxis (i.e., the range of movements and economic
effects). In addition, we observe that the quality of mobility data relies
heavily on the length of the elapsed time of trajectories, and that this
dependence is dataset specific. For example, the elapsed time for movements
of banknotes is normally long, so trajectories with short elapsed time
indicate good data quality. In contrast, the elapsed time for tracked
locations of cell phone users is usually short, so long elapsed time
indicates high-quality trajectories. We are therefore interested in studying
to what extent the limitations of the data influence the modeling of the
distribution of displacements. Last but not least, we would like to explore
the relationship between the mobility patterns modeled from taxi
trajectories and the mobility patterns observed in other datasets.